A Perspective of Evolution After Five Years: A Large-Scale Study of Web Spam Evolution

Authors

  • De Wang
  • Danesh Irani
  • Calton Pu
Abstract

Identifying and detecting web spam is an ongoing battle between spam researchers and spammers, one that has continued from the time search engines first allowed searching of web pages to the modern sharing of web links via social networks. A common challenge faced by spam researchers is that new techniques require a corpus of both legitimate and spam web pages. Although large corpora of legitimate web pages are available to researchers, the same cannot be said of spam web pages. In this paper, we introduce the Webb Spam Corpus 2011, a corpus of approximately 330,000 spam web pages, which we make available to researchers in the fight against spam. By having a standard corpus available, researchers can collaborate better on developing and reporting results of spam filtering techniques. The corpus contains web pages crawled from links found in over 6.3 million spam emails. We analyze multiple aspects of this corpus, including redirection, HTTP headers, web page content, and classification evaluation. We also provide insights into changes in web spam since the last Webb Spam Corpus was released in 2006. These insights include: (1) spammers manipulate social media to spread spam; (2) HTTP headers and content change over time; (3) spammers have evolved and adopted new techniques to avoid detection based on HTTP header information.

Similar resources

Evaluation of the Effect of Health Sector Evolution Plan on the Rate of Cesarean Sections in Hospitals Affiliated to Abadan School of Medical Sciences

Introduction and purpose: One of the health sector evolution plan's goals is to reduce the rate of cesarean section (C-section) and increase the natural childbirth rate. Therefore, this study aimed to evaluate the rate of C-sections before and after the implementation of the health sector evolution plan in the hospitals affiliated to Abadan School of Medical Sciences, Abadan, Iran. Methods: Th...

Obstetricians' perspective on the Health Section Evolution Plan in Iran: A Quality-Case Study

Background and Objectives: The increase in unnecessary caesarean sections has become a serious concern in some health systems. One of the seven packages of the Health Reform Plan that was sent to all Iranian medical universities in 2014 was "Promoting Natural Delivery (vaginal births)." It emphasized the need to reduce cesarean delivery and promot...

Pareto Optimal Balancing of Four-bar Mechanisms Using Multi-Objective Differential Evolution Algorithm

Four-bar mechanisms are widely used in the industry especially in rotary engines. These mechanisms are usually applied for attaining a special motion duty like path generation; their high speeds in the industry cause an unbalancing problem. Hence, dynamic balancing is essential for their greater efficiency. In this research study, a multi-objective differential evolution algorithm is used for P...

Assessment of Distribution of Nursing Staff in Hospitals affiliated to the Ministry of Health and Medical Education before and after the Implementation of the Health System Evolution Plan

Introduction: The distribution and productivity of human resources in the health sector play a significant role in providing health-care services to the covered population. The purpose of this study was to determine the distribution of nursing staff in hospitals affiliated to the Ministry of Health and Medical Education before and after the implementation of the Health Sys...

Fuzzy logic controlled differential evolution to solve economic load dispatch problems

In recent years, soft computing methods have generated a large research interest. The synthesis of fuzzy logic and evolutionary algorithms is one of these methods. A particular evolutionary algorithm (EA) is differential evolution (DE). As with any EA, the DE algorithm requires parameter tuning to achieve desirable performance. In this paper tuning the perturbation factor vector of DE ...

Journal title:
  • Int. J. Cooperative Inf. Syst.

Volume 23  Issue 

Pages  -

Publication date 2014